AITopics | visual image

Collaborating Authors

visual image

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CLIP in Mirror: Disentangling text from visual images through reflection

Neural Information Processing SystemsMay-26-2025, 19:52:02 GMT

The CLIP network excels in various tasks, but struggles with text-visual images i.e., images that contain both text and visual objects; it risks confusing textual and visual representations. To address this issue, we propose MirrorCLIP, a zero-shot framework, which disentangles the image features of CLIP by exploiting the difference in the mirror effect between visual objects and text in the images. Specifically, MirrorCLIP takes both original and flipped images as inputs, comparing their features dimension-wise in the latent space to generate disentangling masks. With disentangling masks, we further design filters to separate textual and visual factors more precisely, and then get disentangled representations. Moreover, our proposed MirrorCLIP reduces confusion when encountering text-visual images and achieves a substantial improvement on typographic defense, further demonstrating its superior ability of disentanglement.

artificial intelligence, disentangling text, machine learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Add feedback

A Case Study on Visual-Audio-Tactile Cross-Modal Retrieval

Wojcik, Jagoda, Jiang, Jiaqi, Wu, Jiacheng, Luo, Shan

arXiv.org Artificial IntelligenceJul-30-2024

Cross-Modal Retrieval (CMR), which retrieves relevant items from one modality (e.g., audio) given a query in another modality (e.g., visual), has undergone significant advancements in recent years. This capability is crucial for robots to integrate and interpret information across diverse sensory inputs. However, the retrieval space in existing robotic CMR approaches often consists of only one modality, which limits the robot's performance. In this paper, we propose a novel CMR model that incorporates three different modalities, i.e., visual, audio and tactile, for enhanced multi-modal object retrieval, named as VAT-CMR. In this model, multi-modal representations are first fused to provide a holistic view of object features. To mitigate the semantic gaps between representations of different modalities, a dominant modality is then selected during the classification training phase to improve the distinctiveness of the representations, so as to improve the retrieval performance. To evaluate our proposed approach, we conducted a case study and the results demonstrate that our VAT-CMR model surpasses competing approaches. Further, our proposed dominant modality selection significantly enhances cross-retrieval accuracy.

modality, representation, retrieval, (14 more...)

arXiv.org Artificial Intelligence

2407.20709

Country: Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Is plantar thermography a valid digital biomarker for characterising diabetic foot ulceration risk?

Jagadeesh, Akshay, Aramrat, Chanchanok, Nur, Aqsha, Mallinson, Poppy, Kinra, Sanjay

arXiv.org Artificial IntelligenceJul-5-2024

Background: In the absence of prospective data on diabetic foot ulcers (DFU), cross-sectional associations with causal risk factors (peripheral neuropathy, and peripheral arterial disease (PAD)) could be used to establish the validity of plantar thermography for DFU risk stratification. Methods: First, we investigated the associations between the intrinsic clusters of plantar thermographic images with several DFU risk factors using an unsupervised deep-learning framework. We then studied associations between obtained thermography clusters and DFU risk factors. Second, to identify those associations with predictive power, we used supervised learning to train Convolutional Neural Network (CNN) regression/classification models that predicted the risk factor based on the thermograph (and visual) input. Findings: Our dataset comprised 282 thermographs from type 2 diabetes mellitus patients (aged 56.31 +- 9.18 years, 51.42 % males). On clustering, we found two overlapping clusters (silhouette score = 0.10, indicating weak separation). There was strong evidence for associations between assigned clusters and several factors related to diabetic foot ulceration such as peripheral neuropathy, PAD, number of diabetes complications, and composite DFU risk prediction scores such as Martins-Mendes, PODUS-2020, and SIGN. However, models predicting said risk factors had poor performances. Interpretation: The strong associations between intrinsic thermography clusters and several DFU risk factors support the validity of using thermography for characterising DFU risk. However, obtained associations did not prove to be predictive, likely due to, spectrum bias, or because thermography and classical risk factors characterise incompletely overlapping portions of the DFU risk construct. Our findings highlight the challenges in standardising ground truths when defining novel digital biomarkers.

dfu risk factor, risk factor, thermography, (15 more...)

arXiv.org Artificial Intelligence

2407.04676

Country:

Europe > United Kingdom (0.14)
North America > Canada > Ontario > Toronto (0.04)
Asia > India > Telangana (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring Boundary of GPT-4V on Marine Analysis: A Preliminary Case Study

Zheng, Ziqiang, Chen, Yiwei, Zhang, Jipeng, Vu, Tuan-Anh, Zeng, Huimin, Tim, Yue Him Wong, Yeung, Sai-Kit

arXiv.org Artificial IntelligenceJan-4-2024

Large language models (LLMs) have demonstrated a powerful ability to answer various queries as a general-purpose assistant. The continuous multi-modal large language models (MLLM) empower LLMs with the ability to perceive visual signals. The launch of GPT-4 (Generative Pre-trained Transformers) has generated significant interest in the research communities. GPT-4V(ison) has demonstrated significant power in both academia and industry fields, as a focal point in a new artificial intelligence generation. Though significant success was achieved by GPT-4V, exploring MLLMs in domain-specific analysis (e.g., marine analysis) that required domain-specific knowledge and expertise has gained less attention. In this study, we carry out the preliminary and comprehensive case study of utilizing GPT-4V for marine analysis. This report conducts a systematic evaluation of existing GPT-4V, assessing the performance of GPT-4V on marine research and also setting a new standard for future developments in MLLMs. The experimental results of GPT-4V show that the responses generated by GPT-4V are still far away from satisfying the domain-specific requirements of the marine professions. All images and prompts used in this study will be available at https://github.com/hkust-vgd/Marine_GPT-4V_Eval

arxiv preprint arxiv, filename, gpt-4v, (13 more...)

arXiv.org Artificial Intelligence

2401.02147

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Self-Supervised Visuo-Tactile Pretraining to Locate and Follow Garment Features

Kerr, Justin, Huang, Huang, Wilcox, Albert, Hoque, Ryan, Ichnowski, Jeffrey, Calandra, Roberto, Goldberg, Ken

arXiv.org Artificial IntelligenceJul-31-2023

Humans make extensive use of vision and touch as complementary senses, with vision providing global information about the scene and touch measuring local information during manipulation without suffering from occlusions. While prior work demonstrates the efficacy of tactile sensing for precise manipulation of deformables, they typically rely on supervised, human-labeled datasets. We propose Self-Supervised Visuo-Tactile Pretraining (SSVTP), a framework for learning multi-task visuo-tactile representations in a self-supervised manner through cross-modal supervision. We design a mechanism that enables a robot to autonomously collect precisely spatially-aligned visual and tactile image pairs, then train visual and tactile encoders to embed these pairs into a shared latent space using cross-modal contrastive loss. We apply this latent space to downstream perception and control of deformable garments on flat surfaces, and evaluate the flexibility of the learned representations without fine-tuning on 5 tasks: feature classification, contact localization, anomaly detection, feature search from a visual query (e.g., garment feature localization under occlusion), and edge following along cloth edges. The pretrained representations achieve a 73-100% success rate on these 5 tasks.

artificial intelligence, machine learning, visual image, (15 more...)

arXiv.org Artificial Intelligence

2209.13042

Country:

North America > United States (0.14)
Europe > Germany (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Learning Generalizable Vision-Tactile Robotic Grasping Strategy for Deformable Objects via Transformer

Han, Yunhai, Yu, Kelin, Batra, Rahul, Boyd, Nathan, Mehta, Chaitanya, Zhao, Tuo, She, Yu, Hutchinson, Seth, Zhao, Ye

arXiv.org Artificial IntelligenceJul-23-2023

Reliable robotic grasping, especially with deformable objects such as fruits, remains a challenging task due to underactuated contact interactions with a gripper, unknown object dynamics and geometries. In this study, we propose a Transformer-based robotic grasping framework for rigid grippers that leverage tactile and visual information for safe object grasping. Specifically, the Transformer models learn physical feature embeddings with sensor feedback through performing two pre-defined explorative actions (pinching and sliding) and predict a grasping outcome through a multilayer perceptron (MLP) with a given grasping strength. Using these predictions, the gripper predicts a safe grasping strength via inference. Compared with convolutional recurrent networks, the Transformer models can capture the long-term dependencies across the image sequences and process spatial-temporal features simultaneously. We first benchmark the Transformer models on a public dataset for slip detection. Following that, we show that the Transformer models outperform a CNN+LSTM model in terms of grasping accuracy and computational efficiency. We also collect a new fruit grasping dataset and conduct online grasping experiments using the proposed framework for both seen and unseen fruits. {In addition, we extend our model to objects with different shapes and demonstrate the effectiveness of our pre-trained model trained on our large-scale fruit dataset. Our codes and dataset are public on GitHub.

artificial intelligence, machine learning, transformer model, (17 more...)

arXiv.org Artificial Intelligence

2112.06374

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Machine Learning and Consumer Data

Chang, Hannah H., Mukherjee, Anirban

arXiv.org Artificial IntelligenceJun-24-2023

The digital revolution has led to the digitization of human behavior, creating unprecedented opportunities to understand observable actions on an unmatched scale. Emerging phenomena such as crowdfunding and crowdsourcing have further illuminated consumer behavior while also introducing new behavioral patterns. However, the sheer volume and complexity of this data present significant challenges for marketing researchers and practitioners. Traditional methods used to analyze consumer data fall short in handling the breadth, precision, and scale of emerging data sources. To address this, computational methods have been developed to manage the "big data" associated with consumer behavior, which typically includes structured data, textual data, audial data, and visual data. These methods, particularly machine learning, allow for effective parsing and processing of multi-faceted data. Given these recent developments, this review article seeks to familiarize researchers and practitioners with new data sources and analysis techniques for studying consumer behavior at scale. It serves as an introduction to the application of computational social science in understanding and leveraging publicly available consumer data.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2306.14118

Country:

Asia > Singapore (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Overview (1.00)

Industry:

Media (1.00)
Banking & Finance (1.00)
Information Technology (0.95)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Multimodal Zero-Shot Learning for Tactile Texture Recognition

Cao, Guanqun, Jiang, Jiaqi, Bollegala, Danushka, Li, Min, Luo, Shan

arXiv.org Artificial IntelligenceJun-22-2023

Tactile sensing plays an irreplaceable role in robotic material recognition. It enables robots to distinguish material properties such as their local geometry and textures, especially for materials like textiles. However, most tactile recognition methods can only classify known materials that have been touched and trained with tactile data, yet cannot classify unknown materials that are not trained with tactile data. To solve this problem, we propose a tactile zero-shot learning framework to recognise unknown materials when they are touched for the first time without requiring training tactile samples. The visual modality, providing tactile cues from sight, and semantic attributes, giving high-level characteristics, are combined together to bridge the gap between touched classes and untouched classes. A generative model is learnt to synthesise tactile features according to corresponding visual images and semantic embeddings, and then a classifier can be trained using the synthesised tactile features of untouched materials for zero-shot recognition. Extensive experiments demonstrate that our proposed multimodal generative model can achieve a high recognition accuracy of 83.06% in classifying materials that were not touched before. The robotic experiment demo and the dataset are available at https://sites.google.com/view/multimodalzsl.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2306.12705

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > China > Shaanxi Province > Xi'an (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)

Add feedback

Controllable Visual-Tactile Synthesis

Gao, Ruihan, Yuan, Wenzhen, Zhu, Jun-Yan

arXiv.org Artificial IntelligenceMay-4-2023

Deep generative models have various content creation applications such as graphic design, e-commerce, and virtual Try-on. However, current works mainly focus on synthesizing realistic visual outputs, often ignoring other sensory modalities, such as touch, which limits physical interaction with users. In this work, we leverage deep generative models to create a multi-sensory experience where users can touch and see the synthesized object when sliding their fingers on a haptic surface. The main challenges lie in the significant scale discrepancy between vision and touch sensing and the lack of explicit mapping from touch sensing data to a haptic rendering device. To bridge this gap, we collect high-resolution tactile data with a GelSight sensor and create a new visuotactile clothing dataset. We then develop a conditional generative model that synthesizes both visual and tactile outputs from a single sketch. We evaluate our method regarding image quality and tactile rendering accuracy. Finally, we introduce a pipeline to render high-quality visual and tactile outputs on an electroadhesion-based haptic device for an immersive experience, allowing for challenging materials and editable sketch inputs.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.03051

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.81)

Industry: Information Technology > Services > e-Commerce Services (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Add feedback

Vis2Hap: Vision-based Haptic Rendering by Cross-modal Generation

Cao, Guanqun, Jiang, Jiaqi, Mao, Ningtao, Bollegala, Danushka, Li, Min, Luo, Shan

arXiv.org Artificial IntelligenceJan-17-2023

To assist robots in teleoperation tasks, haptic rendering which allows human operators access a virtual touch feeling has been developed in recent years. Most previous haptic rendering methods strongly rely on data collected by tactile sensors. However, tactile data is not widely available for robots due to their limited reachable space and the restrictions of tactile sensors. To eliminate the need for tactile data, in this paper we propose a novel method named as Vis2Hap to generate haptic rendering from visual inputs that can be obtained from a distance without physical interaction. We take the surface texture of objects as key cues to be conveyed to the human operator. To this end, a generative model is designed to simulate the roughness and slipperiness of the object's surface. To embed haptic cues in Vis2Hap, we use height maps from tactile sensors and spectrograms from friction coefficients as the intermediate outputs of the generative model. Once Vis2Hap is trained, it can be used to generate height maps and spectrograms of new surface textures, from which a friction image can be obtained and displayed on a haptic display. The user study demonstrates that our proposed Vis2Hap method enables users to access a realistic haptic feeling similar to that of physical objects. The proposed vision-based haptic rendering has the potential to enhance human operators' perception of the remote environment and facilitate robotic manipulation.

artificial intelligence, friction coefficient, haptic rendering, (16 more...)

arXiv.org Artificial Intelligence

2301.06826

Country:

Europe > United Kingdom (0.14)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback